Randomized algorithms for distributed computation of principal component analysis and singular value decomposition

Authors

  • Huamin Li
  • Yuval Kluger
  • Mark Tygert
Abstract

As illustrated via numerical experiments with an implementation in Spark (the popular platform for distributed computation), randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a full principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark; for instance, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
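
As a rough illustration of problem (2), the following is a minimal single-machine sketch of a randomized low-rank SVD in NumPy. It is not the authors' Spark implementation; the function name `randomized_svd` and the parameters `oversample` and `power_iters` are assumptions chosen for this example. It also checks how close the computed left singular vectors are to orthonormal, the property the abstract highlights.

```python
import numpy as np

def randomized_svd(a, rank, oversample=10, power_iters=2, seed=0):
    """Hypothetical helper: approximate the top `rank` singular triplets of `a`."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + oversample, m, n)
    # Sketch the range of `a` with a Gaussian test matrix, then orthonormalize.
    q, _ = np.linalg.qr(a @ rng.standard_normal((n, k)))
    # Power iterations, re-orthonormalizing at each step for numerical stability.
    for _ in range(power_iters):
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)
    # Exact SVD of the small projected matrix (k x n).
    u_small, s, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank, :]

# Usage on a synthetic matrix of exact rank 10 (illustrative data only).
rng = np.random.default_rng(1)
a = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 100))
u, s, vt = randomized_svd(a, rank=10)

# Deviation of the left singular vectors from orthonormality (spectral norm).
ortho_err = np.linalg.norm(u.T @ u - np.eye(10), 2)
# Relative spectral-norm error of the rank-10 reconstruction.
recon_err = np.linalg.norm(a - (u * s) @ vt, 2) / np.linalg.norm(a, 2)
```

Building `u` as the product of an explicitly orthonormalized basis `q` and the left singular vectors of a small matrix is one common way to keep the columns of `u` close to orthonormal; whether the paper's implementation takes the same route is not stated in the abstract.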

Similar resources

Randomized Matrix Decompositions using R

The singular value decomposition (SVD) is among the most ubiquitous matrix factorizations. Specifically, it is a cornerstone algorithm for data analysis, dimensionality reduction and data compression. However, despite modern computer power, massive datasets pose a computational challenge for traditional SVD algorithms. We present the R package rsvd, which enables the fast computation of the SVD...

An implementation of a randomized algorithm for principal component analysis

Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). The present paper presents an essentially black-box, fool-proof implementation for Mathworks’ MATLAB, a popular software platform for numerical computation. As illustrated v...

High-Performance Out-of-core Block Randomized Singular Value Decomposition on GPU

Fast computation of singular value decomposition (SVD) is of great interest in various machine learning tasks. Recently, SVD methods based on randomized linear algebra have shown significant speedup in this regime. This paper attempts to further accelerate the computation by harnessing a modern computing architecture, namely graphics processing unit (GPU), with the goal of processing large-scal...

Lossy Color Image Compression Based on Singular Value Decomposition and GNU GZIP

In matrix algebra, the singular value decomposition (SVD) is a factorization of a complex matrix that has been applied to principal component analysis, canonical correlation in statistics, and the determination of low-rank approximations of matrices. In this paper, using the SVD and the theory of low-rank approximation of a matrix, we offer a new scheme for color image compression based on singul...

Exploratory factor and principal component analyses: some new aspects

Exploratory Factor Analysis (EFA) and Principal Component Analysis (PCA) are popular techniques for simplifying presentation of, and investigating structure of, an (n×p) data matrix. However, these fundamentally different techniques are frequently confused, and the differences between them are obscured, because they give similar results in some practical cases. We therefore investigate conditio...

Journal:
  • CoRR

Volume abs/1612.08709

Publication date: 2016